Welcome back to deep learning and as promised in the last video we want to go ahead and talk a bit about more sophisticated architectures than the residual networks that we've seen in the previous video.
Okay, what do I have for you? Well, of course we can use this recipe of residual connections also with our Inception network, and this then leads to Inception-ResNet.
And you see that the idea of residual connections is so simple that you can very easily incorporate it into many other architectures. This is also why we present this selection of architectures here: they are important building blocks towards really deep networks.
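To make this concrete, here is a minimal sketch in PyTorch (my choice of framework here, not something prescribed in the lecture) of a toy Inception-style block wrapped with a residual connection. The class name, branch layout, and channel sizes are purely illustrative and not the exact Inception-ResNet block from the paper.

```python
import torch
import torch.nn as nn

class InceptionResidualBlock(nn.Module):
    """Toy Inception-style block with a residual (skip) connection."""

    def __init__(self, channels):
        super().__init__()
        # Two parallel branches with different receptive fields
        self.branch1 = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(channels, channels // 2, kernel_size=1),
            nn.Conv2d(channels // 2, channels // 2, kernel_size=3, padding=1),
        )
        # 1x1 convolution brings the concatenated branches back to `channels`
        self.project = nn.Conv2d(channels, channels, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = torch.cat([self.branch1(x), self.branch3(x)], dim=1)
        out = self.project(out)
        return self.relu(x + out)   # residual connection: add the input back

print(InceptionResidualBlock(64)(torch.randn(1, 64, 32, 32)).shape)
# torch.Size([1, 64, 32, 32])
```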
You can see here that the inception and ResNet architectures really help you also to build very powerful networks and I really like this plot because you can learn a lot from it.
So you see on the y-axis the performance in terms of top-1 accuracy, and on the x-axis the number of operations, measured in GFLOPs. You also see the number of parameters of each model, indicated by the diameter of its circle.
Here you can see that VGG16 and VGG19 are at the very far right, so they are very computationally expensive, and their performance is good, but not as good as that of other models we've seen in this class.
You also see that AlexNet is at the bottom left, so it doesn't need many computations; in terms of parameters, however, it is still quite large, and its performance is not too great.
And you see that with a batch-normalized Network-in-Network you get better, and then there are GoogLeNet and ResNet-18, which have an increased top-1 accuracy.
And we see that we can now go ahead and build deeper models without adding too many new parameters, and this helps us to build more effective and better-performing networks.
Of course, after some time we also start increasing the parameter space again, and you can see that some of the best performances here are obtained with the Inception V3 and Inception V4 networks, or also with ResNet-101.
Okay, well, what are other recipes that can help you build better models?
One thing that has been shown to work quite well is increasing the width of residual networks. So there are wide residual networks: they decrease the depth, but they increase the width of the residual blocks.
They also use dropout inside these residual blocks, and you can show that a 16-layer-deep wide network with a similar number of parameters can outperform a thousand-layer-deep network.
So here the power comes not from the depth, but from the residual connections and the width that is introduced.
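As a rough illustration, here is a sketch of a widened residual block with dropout between the convolutions, in the spirit of Wide ResNets; the widen factor, dropout rate, and class name are illustrative assumptions, not the reference implementation.

```python
import torch
import torch.nn as nn

class WideBasicBlock(nn.Module):
    """Residual block whose channel count is multiplied by a widen factor,
    with dropout between the two 3x3 convolutions (pre-activation style)."""

    def __init__(self, in_channels, base_channels, widen_factor=4, dropout=0.3):
        super().__init__()
        out_channels = base_channels * widen_factor
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.drop = nn.Dropout(dropout)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False)
        self.relu = nn.ReLU(inplace=True)
        # 1x1 shortcut when the widened output does not match the input width
        self.shortcut = (nn.Identity() if in_channels == out_channels
                         else nn.Conv2d(in_channels, out_channels, 1, bias=False))

    def forward(self, x):
        out = self.conv1(self.relu(self.bn1(x)))
        out = self.conv2(self.drop(self.relu(self.bn2(out))))
        return out + self.shortcut(x)   # residual connection

print(WideBasicBlock(16, 16, widen_factor=4)(torch.randn(1, 16, 32, 32)).shape)
# torch.Size([1, 64, 32, 32])
```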
There is also ResNeXt, where all of the previous recipes are brought together.
It allows aggregated residual transformations.
You can see that this is actually equivalent to an early concatenation of the branches, so we can replace it with early concatenation, and the general idea is then that you do a group convolution.
So you have the input and output channels divided into groups, and the convolutions are performed separately within every group.
Now, this has similar FLOPs and a similar number of parameters to a ResNet bottleneck block, but it is wider and it is a sparsely connected module.
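A grouped convolution of this kind is available in PyTorch via the `groups` argument. The following is a sketch of a ResNeXt-style bottleneck under my own assumptions: the channel sizes and the cardinality of 32 are illustrative numbers, not the exact configuration from the paper.

```python
import torch
import torch.nn as nn

class ResNeXtBottleneck(nn.Module):
    """Bottleneck block with a grouped 3x3 convolution, i.e. the
    grouped-convolution view of aggregated residual transformations."""

    def __init__(self, channels=256, bottleneck_width=128, cardinality=32):
        super().__init__()
        self.reduce = nn.Conv2d(channels, bottleneck_width, kernel_size=1, bias=False)
        # groups=cardinality splits the channels into independent groups;
        # each group is convolved separately, then the results are concatenated
        self.grouped = nn.Conv2d(bottleneck_width, bottleneck_width, kernel_size=3,
                                 padding=1, groups=cardinality, bias=False)
        self.expand = nn.Conv2d(bottleneck_width, channels, kernel_size=1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.reduce(x))
        out = self.relu(self.grouped(out))
        out = self.expand(out)
        return self.relu(x + out)   # residual connection around the whole block

print(ResNeXtBottleneck()(torch.randn(1, 256, 14, 14)).shape)
# torch.Size([1, 256, 14, 14])
```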
So this is quite popular. Then of course you can combine this. So you have a ResNet of ResNets.
You can build in even more skip connections with DenseNets.
Here you try to connect almost everything with everything: these are the densely connected convolutional neural networks.
This improves feature propagation, encourages feature reuse, and also very much alleviates the vanishing gradient problem.
And with up to 264 layers, you actually need one-third fewer parameters for the same performance as a ResNet, thanks to the transition layers that use 1×1 convolutions.
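The dense connectivity and the 1×1 transition layer can be sketched as follows; the growth rate, the number of layers, and the compression factor of one half are illustrative assumptions in the spirit of DenseNet, not its exact configuration.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Dense block: every layer receives the concatenation of all previous
    feature maps, followed by a 1x1 transition layer that compresses the
    accumulated channels."""

    def __init__(self, in_channels, growth_rate=12, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1, bias=False),
            ))
            channels += growth_rate   # each layer adds growth_rate new maps
        # transition layer: 1x1 convolution halves the channels, pooling halves the size
        self.transition = nn.Sequential(
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels // 2, kernel_size=1, bias=False),
            nn.AvgPool2d(2),
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # feature reuse: concatenate everything produced so far
            features.append(layer(torch.cat(features, dim=1)))
        return self.transition(torch.cat(features, dim=1))

print(DenseBlock(in_channels=24)(torch.randn(1, 24, 32, 32)).shape)
# torch.Size([1, 36, 16, 16]): (24 + 4*12) channels, halved by the transition
```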
Another very nice idea that I would like to show you is the squeeze-and-excitation network.
This was the ImageNet challenge winner in 2017, with a 2.3% top-5 error.
The idea of the model is that you want to explicitly model the channel interdependencies,
which essentially means that some channels are more relevant than others depending on the content.
So the idea is that, for example, features that respond to dark regions will not be very interesting when you're trying to look at cars.
So how is this implemented? Well, we add a trainable module that allows the rescaling of the channels depending on the input.
So we have the feature maps shown here, and then we have a side branch.
The side branch maps each feature map down to a single value, and these values are then multiplied with the respective feature maps,
allowing some feature maps to be suppressed and others to be scaled up, depending on the input.
Then we essentially squeeze: we compress each channel into one value by global average pooling.
So this is how we construct the feature importance. And then we excite, where we use fully connected layers and a sigmoid function in order to excite only the important channels.
And by the way, this is very similar to what we would be doing in gating in the long short term memory cells, which we'll talk about probably in one of the videos next week.
Then we scale: we scale the input maps with this output. And we can, of course, combine this with most other architectures: with Inception, with ResNet, with ResNeXt.
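Putting the squeeze, excite, and scale steps together, a squeeze-and-excitation module can be sketched as below; the reduction ratio of 16 and the class name are illustrative assumptions, not the paper's prescribed values.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation module: squeeze each channel to one value by
    global average pooling, excite with two fully connected layers and a
    sigmoid, then rescale the input maps channel-wise."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),   # per-channel importance in [0, 1], acting like a gate
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        # squeeze: one value per channel
        s = x.mean(dim=(2, 3))
        # excite: learn the channel interdependencies
        w = self.fc(s).view(n, c, 1, 1)
        # scale: suppress or emphasize feature maps depending on the input
        return x * w

print(SEBlock(64)(torch.randn(2, 64, 16, 16)).shape)
# torch.Size([2, 64, 16, 16])
```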
So plenty of different options that we could go for. And to be honest, I don't want to show you another architecture here.
What we'll do next time is talk about learning architectures. So aren't there ways to determine this automatically?
Wouldn't that be much more efficient? Well, stay tuned. I will tell you in the next video. Bye bye.